Abstract

Background: Copy number variations (CNVs) and DNA sequence alterations affecting specific neuronal genes are established risk factors for Autism Spectrum Disorder (ASD). Although ASD is largely considered a genetic condition, these mutations so far account for only ~20% of individuals with an ASD diagnosis. However, non-coding genomic sequence also contains functional elements, making it a source of additional candidate disease risk loci for investigation.

Results: We performed genome-wide analyses and identified rare inherited CNVs affecting non-genic intervals in 41 of 1491 (3%) ASD cases examined. Examples of such intergenic CNV regions include 16q21 and 2p16.3, near the known ASD risk genes CDH8 and NRXN1, respectively, as well as novel loci adjacent to ZHX2, MOCS1, LRRC4C, SEMA3C and other genes.

Conclusions: Rare variants in intergenic regions may implicate new risk loci and genes in ASD, and they also provide useful data for comparison with forthcoming whole-genome sequencing datasets.
Background

Newer genomic technologies such as high-resolution microarrays and next-generation exome sequencing have enabled the identification of many clinically relevant genetic variants for both Mendelian and complex disorders. Yet for many conditions the identified genes account for only a proportion of the heritability. This observation, coupled with the recognition of the functional relevance of non-genic regions [1], makes these genomic segments candidates for investigation of a role in disease.

ASD encompasses a range of neurodevelopmental disorders characterised by social impairment, communication difficulties and restricted, repetitive behavioural patterns. ASD, which is clinically and genetically heterogeneous, demonstrates high heritability, familial clustering and a ~4:1 male to female bias. While there has been progress in identifying risk genes, most are still unknown [2]. Analyses of rare (<1% population frequency) CNVs, insertions and deletions (indels) and point mutations have most convincingly identified synaptic genes, such as members of the Neuroligin (NLGN3, NLGN4) [3], Neurexin (NRXN1 [4], NRXN2 [5], NRXN3 [6]) and SHANK (SHANK1 [7], SHANK2 [8], SHANK3 [9]) families and Gephyrin [10], as highly penetrant risk loci [2]. ASD subjects carrying multiple genetic risk factors for ASD and associated medical conditions are also known [11]. In addition, there are a few examples of mutations in ASD cases identified in non-genic segments of DNA [12] and in non-coding RNAs [13]. Similar findings are even better documented in studies of intellectual disability [14,15], which is observed in ~40% of cases of ASD. Focusing on the intergenic intervals of the genome, we performed a systematic genome-wide search for rare CNVs enriched in cases compared with controls [16] in order to identify known and novel ASD susceptibility loci.

Methods 

A collection of 1491 unrelated ASD cases was genotyped using either the Illumina 1M (993 cases) or the Affymetrix SNP 6.0 (498 cases) platform. The ASD subjects, all diagnosed using gold-standard instruments including the Autism Diagnostic Interview and the Autism Diagnostic Observation Schedule, are described elsewhere [16,17]. Informed written consent was obtained from all participants, as approved by the Research Ethics Boards at The Hospital for Sick Children and McMaster University. For controls, 1287 samples from the SAGE cohort were genotyped on the Illumina 1M, and 1234 samples from the Ottawa Heart Institute (OHI) and 1123 from the POPGEN collections were genotyped on the Affymetrix SNP 6.0. CNV discovery was performed using previously described pipelines [16-18]. Three CNV detection tools were used for each platform (Birdsuite, iPattern and Genotyping Console for the Affymetrix 6.0; iPattern, QuantiSNP and PennCNV for the Illumina 1M). CNVs in both cases and controls were considered rare if they were present in <1% of the overall dataset, and these were analysed further if they neither intersected nor fell within a known gene (according to the NCBI Reference Sequence (RefSeq) annotation, August 2011). Rare genic CNVs identified from these data have been reported previously; approximately 10% of cases carry a de novo or rare inherited CNV thought to contribute to ASD in that individual [16,17,19,20]. All CNVs discussed were validated, where DNA was available, using independent laboratory methods such as long-range or quantitative PCR, and their mode of inheritance was determined (Additional files 1 and 2).

© 2013 Walker and Scherer; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Walker and Scherer BMC Genomics 2013, 14:499
http://www.biomedcentral.com/1471-2164/14/499

[Figure 1 image: four genome browser panels, A) chr2:49900163..52600001 (NRXN1; mRNAs AB011150 and AK127244), B) chr11:40000000..42500001 (LRRC4C), C) chr8:34900000..35800001 (UNC5D) and D) chr4:65830000..67200001 (EPHA5), each with RefSeq gene, mRNA and/or human EST tracks above the case losses and gains.]

Figure 1 Genome browser views of ASD-specific CNVs at A) 2p16.3, B) 11p12, C) 8p12 and D) 4q13.1. In each case, representative isoforms of known RefSeq genes, mRNA and/or Expressed Sequence Tags are shown. Deletions and duplications are represented by red and blue bars, respectively. In Figure 1A, a dashed line indicates a diploid region located between two adjacent deletions in the same individual. Additional browser views of the other loci shown in Table 1 are included in Additional file 1 A-J. In all cases where parental DNA was available, the CNVs shown were found to be inherited. Additional case SK0167-003 was found in Marshall et al. [19].

Results and discussion 

Microarray data from a cohort of 1491 unrelated ASD probands were analysed for rare copy number variants as described previously [16,17], and CNVs falling outside known coding sequence were identified. A total of 212 non-coding genomic regions were found to harbor overlapping CNVs in two or more unrelated ASD cases that were absent in control samples. Each region was examined for plausible biological function by comparison with multiple databases. Data were collated on evidence of expressed sequences from mRNA or EST data in GenBank, on evolutionary conservation, and on functional predictions from the VISTA enhancer browser (http://enhancer.lbl.gov/) and Rfam (http://rfam.sanger.ac.uk/). The Database of Genomic Variants (http://dgvbeta.tcag.ca/dgv/app/home) was used to eliminate further regions whose CNVs were not ASD-specific, and regions with >80% of their sequence masked as repetitive were removed. Loci were also prioritised for potential clinical significance in ASD based on their proximity to known or candidate ASD risk genes [17].
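The core filtering logic described in the Methods and in the paragraph above (rare, non-genic, recurrent in at least two unrelated cases, absent from controls, not predominantly repetitive) can be summarised as a small pipeline. The Python sketch below is illustrative only: the `CNV` record layout, the half-open coordinates, the way carrier frequency is computed and the `repeat_fraction` callback are assumptions, not the authors' published pipeline, and the database annotation steps (GenBank, VISTA, Rfam, DGV) are not modelled. The toy coordinates are invented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CNV:
    sample: str
    chrom: str
    start: int  # assumed 0-based, half-open [start, end)
    end: int
    is_case: bool


Gene = tuple[str, int, int]  # hypothetical (chrom, start, end) gene record


def overlaps(c: CNV, chrom: str, start: int, end: int) -> bool:
    return c.chrom == chrom and c.start < end and start < c.end


def carrier_frequency(cnv: CNV, all_cnvs: list[CNV], n_samples: int) -> float:
    """Fraction of all genotyped samples carrying a CNV that overlaps `cnv`."""
    carriers = {c.sample for c in all_cnvs if overlaps(c, cnv.chrom, cnv.start, cnv.end)}
    return len(carriers) / n_samples


def candidate_regions(
    cnvs: list[CNV],
    genes: list[Gene],
    n_samples: int,
    repeat_fraction: Callable[[str, int, int], float],  # hypothetical callback
    max_freq: float = 0.01,
    min_cases: int = 2,
    max_repeat: float = 0.8,
) -> list[tuple[str, int, int, set[str]]]:
    # Step 1: keep rare CNVs that do not intersect any annotated gene.
    rare = [
        c for c in cnvs
        if carrier_frequency(c, cnvs, n_samples) < max_freq
        and not any(overlaps(c, g_chrom, g_start, g_end) for g_chrom, g_start, g_end in genes)
    ]
    # Step 2: merge overlapping rare case CNVs into regions.
    regions: list[list] = []
    for c in sorted((c for c in rare if c.is_case), key=lambda c: (c.chrom, c.start)):
        if regions and regions[-1][0] == c.chrom and c.start < regions[-1][2]:
            regions[-1][2] = max(regions[-1][2], c.end)
            regions[-1][3].add(c.sample)
        else:
            regions.append([c.chrom, c.start, c.end, {c.sample}])
    # Step 3: require >= min_cases cases, no overlapping control CNV, limited repeats.
    controls = [c for c in cnvs if not c.is_case]
    return [
        (chrom, start, end, samples)
        for chrom, start, end, samples in regions
        if len(samples) >= min_cases
        and not any(overlaps(c, chrom, start, end) for c in controls)
        and repeat_fraction(chrom, start, end) <= max_repeat
    ]


# Invented toy data: one recurrent intergenic region on chr2.
cnvs = [
    CNV("case1", "chr2", 100, 200, True),
    CNV("case2", "chr2", 150, 260, True),
    CNV("case3", "chr2", 5000, 5100, True),  # seen in only one case
    CNV("case4", "chr3", 100, 200, True),
    CNV("ctrl1", "chr3", 120, 180, False),   # control overlaps case4
    CNV("case5", "chr2", 1000, 1100, True),  # overlaps a gene
    CNV("case6", "chr2", 1050, 1150, True),  # overlaps a gene
]
genes = [("chr2", 1080, 1300)]
result = candidate_regions(cnvs, genes, 1000, lambda chrom, start, end: 0.0)
print(result)
```

In this toy input only the chr2:100-260 region survives: the chr3 region is seen in a control, the chr2:5000 CNV occurs in a single case, and the CNVs near position 1000 touch a gene. The control check here uses all control CNVs rather than only rare ones, which is one reasonable reading of "absent in control samples".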

Fifteen intergenic regions emerged as plausible candidate ASD risk loci, and in all instances the defining CNV events were inherited. In one of these regions, an additional case (SK0167-003) with an overlapping CNV was identified among those described by Marshall et al. (2008) [19] (Table 1, Figure 1 and Additional files 1 and 2). In 14 of the 15 regions, the intergenic interval has not been described before, and in three regions the CNV neighboured a known ASD gene, namely CDH8 [21], C3orf58 [22] and NRXN1 [4]. In the case of NRXN1, upstream CNVs found in five individuals affect the same mRNA (AK127244) that has been reported elsewhere to carry a CNV in a family with ASD (Table 1, Figure 1A) [23]. Other intergenic CNVs identified highlight regions at 8q24.12 upstream of ZHX2, 6p21.2 upstream of MOCS1, 11p12 upstream of LRRC4C (Figure 1B) and 7q21.11 upstream of SEMA3C as putative novel ASD rearrangements. In one case (8-14208-3350), deletions were identified at three separate loci (4q13.1 upstream of EPHA5, 11p14.3 upstream of LUZP2 and 11p12 upstream of LRRC4C), and another case (3-0496-003) carried a 47,XXY sex chromosome imbalance. Other CNVs found in these 41 cases are shown in Additional file 3, and any or all of these may contribute to the genetic load for ASD [11,17]. Interestingly, all the CNVs identified through our analysis are inherited events. The significance of this observation remains to be determined, but it suggests incomplete and/or variable penetrance, which is often observed in ASD [6,7,17].
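The text describes CNVs as lying "upstream" or "downstream" of genes, and Table 1 reports a "Furthest distance from gene", but neither is defined formally. One plausible reading, sketched below in Python as an assumption rather than the authors' procedure, is strand-aware: a CNV at lower coordinates than a '+' strand gene is upstream (5'), the reverse holds for a '-' strand gene, and the furthest distance is measured from the nearest gene boundary to the CNV breakpoint farthest from the gene. The function name and all coordinates are hypothetical.

```python
from typing import Literal

Strand = Literal["+", "-"]


def relative_position(
    cnv_start: int, cnv_end: int, gene_start: int, gene_end: int, strand: Strand
) -> tuple[str, int]:
    """Classify an intergenic CNV as upstream or downstream of a gene.

    Coordinates are assumed half-open [start, end). Returns the side of the
    gene on which the CNV lies (relative to the direction of transcription)
    and the distance in bp from the nearest gene boundary to the CNV
    breakpoint farthest from the gene.
    """
    if cnv_start < gene_end and gene_start < cnv_end:
        raise ValueError("CNV overlaps the gene, so it is not intergenic")
    if cnv_end <= gene_start:
        # CNV at lower coordinates: 5' of a '+' gene, 3' of a '-' gene.
        side = "upstream" if strand == "+" else "downstream"
        furthest = gene_start - cnv_start
    else:
        # CNV at higher coordinates: 3' of a '+' gene, 5' of a '-' gene.
        side = "downstream" if strand == "+" else "upstream"
        furthest = cnv_end - gene_end
    return side, furthest


# Hypothetical gene at [1,000,000, 1,500,000) and CNVs on either side.
print(relative_position(600_000, 700_000, 1_000_000, 1_500_000, "+"))
print(relative_position(1_600_000, 1_650_000, 1_000_000, 1_500_000, "-"))
```

The same CNV can therefore be upstream of one gene and downstream of its neighbour on the other side, which is why Table 1 lists more than one distance for some loci.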

The mechanism of action of these rare CNVs in the pathogenesis of ASD could be (i) altering the copy number or positional context of key DNA sequence elements required for the proper regulation of nearby genes [1], (ii) affecting still-undiscovered genes or non-coding RNAs residing in the CNV regions, or (iii) disrupting uncharacterized isoforms of the adjacent annotated genes. In support of the first scenario, we find CNVs both upstream (e.g. UNC5D (Figure 1C), MOCS1, ASTN2, SEMA3C, ZHX2, LUZP2, CDH8) and downstream (C3orf58, RXRA, MRGPRD) of known ASD risk genes and putative novel loci. For at least three regions (4q13.1, 6p21.2 and 11p12, shown in Figure 1D, Additional file 1C and Figure 1B, respectively), our CNV mapping data in fact identify two distinct clusters of CNVs at the same locus, all overlapping spliced ESTs and thus with a possible regulatory role. In support of the second scenario, three independent CNV deletions interrupting a collection of spliced expressed sequence tags approximately 330 kb proximal to EPHA5 highlight a potentially newly discovered ASD risk gene (Figure 1D). Finally, in support of the third scenario, longer isoforms of LRRC4C likely exist given the discovery of mRNAs DQ084201 and DQ084202. There are, of course, other functional DNA elements or modifications that need to be considered [24] as mapping resolution increases.

Table 1 ASD-specific CNVs in intergenic regions

| Locus | Gene | Sample | CNV | Start | End | Size (bp) | Furthest distance from gene | Bin |
|---|---|---|---|---|---|---|---|---|
| 2p16.3 | NRXN1, AK127244 mRNA | 1-0045-004 | loss | 51405882 | 51524684 | 118802 | 1124 | ii |
| | | 8-3394-003 | loss | 51439897 | 51479683 | 39786 | | |
| | | 8-3394-003 | loss | 51157414 | 51189362 | 31948 | | |
| | | 8-14144-2420 | loss | 51157414 | 51225851 | 68437 | | |
| | | 1-0496-003 | gain | 52220120 | 52238172 | 18052 | | |
| | | 1-0449-003 | loss | 52237072 | 52253660 | 16588 | | |
| 3p22.3 | ARPP21 | 2-1213-003 | loss | 34984049 | 35102773 | 118724 | 563 | ii |
| | | 3-0100-000 | gain | 35086691 | 35094736 | 8045 | | |
| 3q24 | C3orf58, ZIC1, ZIC4 | 1-0007-003 | loss | 146168760 | 146934953 | 766193 | 1383, 1955, 1979 | i |
| | | 8-3093-004 | loss | 146575437 | 146631141 | 55704 | | |
| 4q13.1 | EPHA5 | 8-14208-3350 | loss | 66505324 | 66633530 | 128206 | 840 | i |
| | | 8-14186-3050 | loss | 66515708 | 66633530 | 117822 | | |
| | | 1-0138-004 | loss | 66515708 | 66633530 | 117822 | | |
| | | 2-0082-004 | loss | 67045815 | 67134170 | 88355 | | |
| | | 1-0455-003 | loss | 67058506 | 67075558 | 17052 | | |
| 6p21.2 | MOCS1 | 3-0139-000 | gain | 40021898 | 40078515 | 56617 | 168 | i or ii |
| | | 2-0139-003 | gain | 40023327 | 40062155 | 38828 | | |
| | | 1-0381-003 | loss | 40174188 | 40209324 | 35136 | | |
| | | 2-1368-003 | loss | 40174188 | 40210694 | 36506 | | |
| 7q21.11 | SEMA3C | 8-6258-03 | loss | 80431202 | 80512022 | 80820 | 96 | i |
| | | 1-0345-005 | loss | 80482597 | 80517630 | 35033 | | |
| 8p12 | UNC5D, NRG1 | 8-14243-3670 | loss | 34923482 | 34956067 | 32585 | 256, 2183 | i |
| | | 3-0044-000 | loss | 34923482 | 34956067 | 32585 | | |
| | | 3-0300-000 | loss | 34925149 | 34957854 | 32705 | | |
| | | 8-14181-2940 | loss | 34923482 | 34956067 | 32585 | | |
| 8q24.13 | ZHX2 | 8-3317-003 | gain | 123572785 | 123625681 | 52896 | 237 | i or ii |
| | | 3-0186-000 | loss | 123583028 | 123639417 | 56389 | | |
| 9q33.1 | ASTN2 | 8-3055-004 | loss | 119254497 | 119374796 | 120299 | 98 | i |
| | | 3-0115-000 | loss | 119314967 | 119319559 | 4592 | | |
| 9q34.2 | OLFM1, RXRA | 2-1272-003 | gain | 136479329 | 136604233 | 124904 | 508, 8 | i |
| | | 2-1189-003 | gain | 136480334 | 136598491 | 118157 | | |
| 11p14.3 | LUZP2 | 8-14175-2820 | loss | 24177612 | 24316053 | 138441 | 160 | i or ii |
| | | 8-14059-1020 | loss | 24262511 | 24303132 | 40621 | | |
| | | 8-14208-3350 | loss | 24262511 | 24303132 | 40621 | | |
| 11p12 | LRRC4C | 8-14208-3350 | gain | 40304880 | 40703298 | 398418 | 196 | iii |
| | | 2-0272-003 | loss | 40379668 | 40550356 | 170688 | | |
| | | SK0167-003 | loss | 40417554 | 40610400 | 192846 | | |
| | | 3-0208-000 | loss | 40468058 | 40492541 | 24483 | | |
| 11p12 | LRRC4C | 8-14032-600 | loss | 41990280 | 42021250 | 30970 | 1738 | i or ii |
| | | 8-3276-003 | loss | 42243624 | 42279094 | 35470 | | |
| | | 2-0286-003 | loss | 42243624 | 42279094 | 35470 | | |
| 11q13.2 | MRGPRD | 4-0023-003 | loss | 68486121 | 68493638 | 7517 | 10 | i |
| | | 2-1075-003 | loss | 68486121 | 68500238 | 14117 | | |
| 16q21 | CDH8 | 8-14251-3750 | loss | 61650435 | 61787984 | 137549 | 1030 | i or ii |
| | | 2-1175-003 | loss | 61658675 | 61755232 | 96557 | | |

Location and size of all CNVs discovered are listed with the proposed associated candidate gene. Bin denotes the possible mechanism of action: (i) altering sequence elements required for regulating expression of neighbouring genes; (ii) affecting undiscovered genes or non-coding RNAs; (iii) disrupting uncharacterised isoforms of adjacent genes. Genome browser views of all loci are shown in Figure 1 and Additional file 1. All pedigrees are shown in Additional file 2. Additional sample SK0167-003 was identified in reference [19].
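Several statements in the text can be checked directly against the coordinates in Table 1. The Python sketch below uses only values transcribed from the table: it confirms that Size equals End minus Start for the 4q13.1 and 6p21.2 rows, counts distinct probands (excluding SK0167-003, the additional case from [19]) and groups overlapping CNVs at 4q13.1 and 6p21.2 into clusters. Treating "distinct clusters" as a single-linkage merge of overlapping intervals is our assumption, not a procedure described by the authors.

```python
# Rows transcribed from Table 1: (sample, CNV type, start, end, size in bp).
TABLE1 = {
    "4q13.1": [
        ("8-14208-3350", "loss", 66505324, 66633530, 128206),
        ("8-14186-3050", "loss", 66515708, 66633530, 117822),
        ("1-0138-004", "loss", 66515708, 66633530, 117822),
        ("2-0082-004", "loss", 67045815, 67134170, 88355),
        ("1-0455-003", "loss", 67058506, 67075558, 17052),
    ],
    "6p21.2": [
        ("3-0139-000", "gain", 40021898, 40078515, 56617),
        ("2-0139-003", "gain", 40023327, 40062155, 38828),
        ("1-0381-003", "loss", 40174188, 40209324, 35136),
        ("2-1368-003", "loss", 40174188, 40210694, 36506),
    ],
}

# Sample column of Table 1, one entry per row, in table order.
SAMPLE_ROWS = [
    "1-0045-004", "8-3394-003", "8-3394-003", "8-14144-2420", "1-0496-003", "1-0449-003",  # 2p16.3
    "2-1213-003", "3-0100-000",                                  # 3p22.3
    "1-0007-003", "8-3093-004",                                  # 3q24
    "8-14208-3350", "8-14186-3050", "1-0138-004", "2-0082-004", "1-0455-003",  # 4q13.1
    "3-0139-000", "2-0139-003", "1-0381-003", "2-1368-003",      # 6p21.2
    "8-6258-03", "1-0345-005",                                   # 7q21.11
    "8-14243-3670", "3-0044-000", "3-0300-000", "8-14181-2940",  # 8p12
    "8-3317-003", "3-0186-000",                                  # 8q24.13
    "8-3055-004", "3-0115-000",                                  # 9q33.1
    "2-1272-003", "2-1189-003",                                  # 9q34.2
    "8-14175-2820", "8-14059-1020", "8-14208-3350",              # 11p14.3
    "8-14208-3350", "2-0272-003", "SK0167-003", "3-0208-000",    # 11p12
    "8-14032-600", "8-3276-003", "2-0286-003",                   # 11p12
    "4-0023-003", "2-1075-003",                                  # 11q13.2
    "8-14251-3750", "2-1175-003",                                # 16q21
]


def cluster(rows):
    """Single-linkage merge of overlapping CNV intervals at one locus."""
    clusters = []
    for sample, _kind, start, end, _size in sorted(rows, key=lambda r: r[2]):
        if clusters and start <= clusters[-1]["end"]:
            clusters[-1]["end"] = max(clusters[-1]["end"], end)
            clusters[-1]["samples"].add(sample)
        else:
            clusters.append({"start": start, "end": end, "samples": {sample}})
    return clusters


# Check that Size equals End minus Start in every transcribed row.
sizes_consistent = all(
    end - start == size for rows in TABLE1.values() for _s, _k, start, end, size in rows
)

# Distinct probands, excluding the additional case SK0167-003 from [19].
probands = set(SAMPLE_ROWS) - {"SK0167-003"}

for locus, rows in TABLE1.items():
    for c in cluster(rows):
        print(locus, c["start"], c["end"], sorted(c["samples"]))
print(sizes_consistent, len(probands), round(100 * len(probands) / 1491, 1))
```

Both loci resolve into two separate clusters, in line with the two-cluster pattern described in the text. The proband count comes to 41 (41/1491 ≈ 2.7%, reported as 3% in the Abstract), because 8-3394-003 contributes two rows at 2p16.3 and 8-14208-3350 appears at three loci.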

Conclusions 

Given the challenges in interpreting the clinical significance of the multitude of genetic variants found by, for example, whole-genome sequencing [25], accruing evidence across multiple studies will help prioritise loci outside known genes, or other regulatory elements, for further study, particularly for rare variants. In this light, these data provide a useful resource for comparison as new data sets of both CNVs and nucleotide-level variants become available, helping to fine-map known ASD risk loci and discover new ones. This general research strategy can also be applied to other disease gene studies.

Additional files 



Additional file 1: Genome Browser views of loci with ASD specific 
CNVs. 

Additional file 2: Pedigree structure for all families listed in 
Table 1. 

Additional file 3: Table of all rare CNVs detected in the individuals 
described herein. 